Abstract


Non-canonical base pairs are planar, hydrogen-bonded pairs of nucleobases whose hydrogen bonding patterns
differ from those observed in Watson-Crick base pairs, as in the classic double helical DNA. The structures
of polynucleotide strands of both DNA and RNA molecules can be understood in terms of sugar-phosphate
backbones consisting of phosphodiester-linked D-2'-deoxyribofuranose (D-ribofuranose in RNA) sugar moieties,
with purine or pyrimidine nucleobases covalently linked to them. Here, the N9 atoms of the purines, guanine and
adenine, and the N1 atoms of the pyrimidines, cytosine and thymine (uracil in RNA), form glycosidic
linkages with the C1' atom of the sugars. These nucleobases can be schematically represented as triangles with
one of their vertices linked to the sugar, and the three sides accounting for three edges through which they can
form hydrogen bonds with other moieties, including other nucleobases. As explained in greater detail
later in this article, the side opposite to the sugar-linked vertex is traditionally called the Watson-Crick edge, since
it is involved in forming the Watson-Crick base pairs which constitute the building blocks of double helical DNA.
The two sides adjacent to the sugar-linked vertex are referred to, respectively, as the Sugar and Hoogsteen (C-H
for pyrimidines) edges.


Each of the four different nucleobases is characterized by a distinct edge-specific distribution pattern of its
hydrogen bond donor and acceptor atoms, and complementarity between these patterns, in turn, defines the hydrogen
bonding patterns involved in base pairing. The double helical structures of DNA or RNA are generally known to
have base pairs between complementary bases, Adenine:Thymine (Adenine:Uracil in RNA) or Guanine:Cytosine.
They involve specific hydrogen bonding patterns corresponding to their respective Watson-Crick edges, and are
considered as Canonical Base Pairs. At the same time, the helically twisted backbones in the double helical duplex
DNA form two grooves, major and minor, through which the hydrogen bond donor and acceptor atoms corresponding
respectively to the Hoogsteen and sugar edges are accessible for additional potential molecular recognition
events. Experimental evidence reveals that the nucleotide bases are also capable of forming a wide variety
of pairings in various geometries, having hydrogen bonding patterns different from those observed
in Canonical Base Pairs (Figure 1). These base pairs, which are generally referred to as Non-Canonical Base Pairs,
are held together by multiple hydrogen bonds, and are mostly planar and stable. Most of these play very important
roles in shaping the structure and function of different functional RNA molecules. In addition to their occurrences
in several double stranded stem regions, most of the loops and bulges that appear in single-stranded RNA
secondary structures form recurrent 3D motifs, where non-canonical base pairs play a central role. Non-canonical
base pairs also play crucial roles in mediating the tertiary contacts in RNA 3D structures.
Figure 1 | Examples of a few frequently observed non-canonical base pairs: (a) Adenine:Guanine trans Hoogsteen/Sugar-edge, (b)
Adenine:Uracil trans Hoogsteen/Watson-Crick, (c) Guanine:Guanine cis Watson-Crick/Hoogsteen, (d) protonated Cytosine(+):Cytosine
trans Watson-Crick/Watson-Crick

Figure 2 | IUPAC-IUB recommended nomenclature of nucleotide base atoms of Adenine, Guanine, Uracil and Cytosine bases
(created in MOLDEN).


Double helical structures of DNA, as well as folded single
stranded RNA, are now known to be stabilized by Watson-Crick
base pairing between the purines, Adenine and Guanine,
and the pyrimidines, Thymine (or Uracil for RNA) and Cytosine.
In this scheme, the N1 atom of the purine residue forms a
hydrogen bond with the N3 atom of the pyrimidine residue in
both A:T and G:C complementarity (see Figure 2 for the atom
labeling scheme according to the IUPAC-IUB convention).
The second hydrogen bond in A:T base pairs involves
the N6 amino group of Adenine and the O4 atom of Thymine
(or Uracil in RNA). Similarly, the second hydrogen bond in G:C
base pairs involves the O6 atom of Guanine and the N4 amino
group of Cytosine. The G:C base pairs also have a
third hydrogen bond involving the N2 amino group of Guanine
and the O2 atom of Cytosine. However, until about twenty
years after this scheme was initially proposed by James D.
Watson and Francis H.C. Crick, experimental evidence
suggesting other forms of base-base interactions continued to
draw the attention of researchers investigating the structure
of DNA. The first high resolution structure of an
Adenine:Thymine base pair, solved by Karst Hoogsteen by
single crystal X-ray crystallography in 1959, revealed a structure
whose geometry was very different from what was proposed
by Watson and Crick. It had two hydrogen bonds involving the N7
and N6 atoms of Adenine and the N3 and O4 (or O2) atoms of
Thymine, respectively (Figures 1b and 2). It may be noted that,
due to the use of a Thymine base with a methyl group representing
the sugar, a symmetry axis appears passing through the N3 and C6
atoms, and the O2 and O4 atoms appear identical. In order to
distinguish this alternate base pairing scheme from the
Watson-Crick scheme, base pairs where a hydrogen bond involves
the N7 atom of a purine residue have been referred to as
Hoogsteen base pairs, and, later, the purine base edge which
includes its N7 atom came to be referred to as its Hoogsteen edge.
The first high resolution structure of a Guanine:Cytosine pair,
obtained by W. Guschelbauer, was also similar to the Hoogsteen
base pair, although this structure required an unusual
protonation of the N3 imino nitrogen of Cytosine, which is possible
only at significantly lower pH. Experimental evidence supporting
Watson-Crick base pairing, including low resolution NMR studies
as well as high resolution X-ray crystallographic studies,
was obtained as late as the early 1970s. Almost a decade
later, with the advent of efficient DNA synthesis methods,
Richard Dickerson, followed by several other groups,
solved structures of the physiological double helical B-DNA
with a complete helical turn, based on crystals of synthetic
DNA oligomers. The pairing geometries of the A:T
(A:U in RNA) and G:C pairs in these structures confirmed the
common or canonical form of base pairing as proposed by
Watson and Crick, while those with all other geometries and
compositions are now referred to as non-canonical base
pairs.
It was noticed that even in double stranded DNA, where
canonical Watson-Crick base pairs associate the two
complementary anti-parallel strands together, there were occasional
occurrences of Hoogsteen and other non-Watson-Crick base
pairs. It was also proposed that within Watson-Crick
base pair dominated DNA double helices,
Hoogsteen base pair formation could be a transient phenomenon.

While canonical Watson-Crick base pairs are the most prevalent
and are commonly observed in a majority of chromosomal
DNA and in most functional RNAs, the presence of stable
non-canonical base pairs is also extremely significant in DNA biology.
An example of non-Watson-Crick, or non-canonical, base
pairing can be found at the ends of chromosomal DNA. The
3'-ends of chromosomes contain single stranded overhangs
with some conserved sequence motifs (such as TTAGGG in
most vertebrates). The single stranded region adopts some
definite three-dimensional structures, which have been solved
by X-ray crystallography as well as by NMR spectroscopy.
The single strands containing the above sequence
motifs are found to form interesting four stranded
mini-helical structures stabilized by Hoogsteen base pairing
between Guanine residues. In these structures, four Guanine
residues form a near planar base quartet, referred to as a
G-quadruplex, where each Guanine participates in base pairing
with its neighboring Guanine (Figures 3 and 4), involving their
Watson-Crick and Hoogsteen edges in a cyclic manner. The
four central carbonyl groups are often stabilized by potassium
ions (K+). From the full genomic sequences of different
organisms, it has been observed that telomere-like sequences
sometimes also interrupt double helical regions near the
transcription start sites of some oncogenes, such as c-myc. It is
possible that these sequence stretches form G-quadruplex-like
structures, which can suppress the expression of the related
genes. The complementary Cytosine-rich sequences, on the
other strand, may adopt another similar four stranded
structure, the i-motif, stabilized by Cytosine:Cytosine
non-canonical base pairs.


Figure 3 | Structure of a representative G-Quadruplex consisting of Hoogsteen base pairs between neighboring Guanine
residues (PDB: 1KF1)

Figure 4 | Three G-quadruplexes stack to form a four stranded telomere structure, with different topologies, for the
d(GGGATTGGGATTGGGATTGGG) sequence


While non-canonical base pairs are still relatively rare in
DNA, in RNA molecules, where generally a single
polymeric strand folds onto itself to form various secondary
and tertiary structures, the occurrence of non-Watson-Crick
base pairs turns out to be far more prevalent. As
early as the 1970s, analysis of the crystal structure of
Yeast tRNA showed that RNA structures possess
significant non-canonical variations in base pairing
schemes. Subsequently, the structures of ribozymes,
ribosomes, riboswitches, etc. have highlighted their
abundance, and hence the need for a comprehensive
characterization of Non-Canonical Base Pairs. These
three-dimensional RNA structures generally possess several
secondary structural motifs, such as double helical
stems, stems with hairpin loops, symmetric and
asymmetric internal loops, kissing loops between two hairpin
motifs, pseudoknots, continuous stacks between two
segments of helices, multi-helix junctions, etc.,
along with single stranded regions. These secondary
structural motifs, except for the single stranded motifs,
are stabilized by hydrogen bonded base pairs, and
several of these are non-canonical base pairs, including G:U
Wobble base pairs.


It is notable in this context that the Wobble hypothesis
of Francis Crick predicted the possibility of the G:U base
pair, in place of the canonical G:C or A:U base pairs, also
mediating the recognition between mRNA codons and
tRNA anticodons during protein synthesis. Today, as
can be seen in the corresponding Wiki page on the wobble
base pair, the G:U wobble base pair is the most numerously
observed non-canonical base pair. While, because
of its geometric similarity with the canonical base pairs,
it frequently occurs in the double helical stem regions
of RNA structures, its geometric differences continue
to draw the attention of nucleic acid researchers,
providing new insights related to its structural
significance. It may be noted that though the base pairs in the
folded RNA structures give rise to double helical stems,
their two cleft regions, the major groove and the minor
groove, differ in their respective dimensions from those
in DNA double helices. Unlike those in DNA, the
sequence-discriminating major grooves in RNA double
helices are very narrow and deep. On the other hand, the
minor groove regions, though wide and shallow, do not
carry much sequence-specific information in terms of
the hydrogen bonding donor-acceptor positioning of
the corresponding base pair edges. The G:U wobble
base pairs, along with the various other non-canonical
base pairs, introduce variations in the structures of RNA
double helices, thus enhancing the accessibility of the
discriminating major groove edges of associated base
pairs. This has been seen to be very important for
molecular recognition steps during tRNA aminoacylation
as well as in ribosome functions.


Considering the immense importance of the non-ca- 
nonical base pairs in RNA structure, folding and func- 
tions, researchers from multiple domains — biology, 
chemistry, physics, mathematics, computer science, 
etc., have joined in the effort to understand their struc- 
ture, dynamics, function and their consequences. The 
complexities associated with experimental handling of 
RNA further underline the importance of diverse theo- 
retical inputs towards addressing these issues. 


Types of Non-canonical Base pairs

Classification based on hydrogen bonding


Two bases may approach each other in various ways,
eventually leading to specific molecular recognition
mediated by base pairing interactions, which are often
non-canonical, in addition to strong stacking interactions. These
are essential for the process of RNA single strands
folding into three-dimensional structures. Early studies on
such unusual base pairs by Jiri Sponer, Pavel Hobza and
their group were somewhat disadvantaged by the
unavailability of a suitable, unambiguous and systematic
naming scheme. While some of the observed base
pairs were assigned names following the Saenger
nomenclature scheme, others were arbitrarily assigned
names by different researchers. It may be mentioned
that some attempts were also made by Michael Levitt
and coworkers to classify base-base association in
terms of adjacency of bases, through either pairing or
stacking interactions. There was clearly a need for a
classification scheme for different types of non-canonical
base pairs, which could comprehensively and
unambiguously handle newer variants coming up due to the
rapid increase in the sampling space. Different
approaches which have evolved in response to this need
are discussed below.


Table 1 | The 12 geometric families of base pairs

Interacting edges          Glycosidic bond orientation  Nomenclature                            Local strand orientation
Watson-Crick/Watson-Crick  Cis                          cWW or cis Watson-Crick/Watson-Crick    Antiparallel
Watson-Crick/Watson-Crick  Trans                        tWW or trans Watson-Crick/Watson-Crick  Parallel
Watson-Crick/Hoogsteen     Cis                          cWH or cis Watson-Crick/Hoogsteen       Parallel
Watson-Crick/Hoogsteen     Trans                        tWH or trans Watson-Crick/Hoogsteen     Antiparallel
Watson-Crick/Sugar edge    Cis                          cWS or cis Watson-Crick/Sugar edge      Antiparallel
Watson-Crick/Sugar edge    Trans                        tWS or trans Watson-Crick/Sugar edge    Parallel
Hoogsteen/Hoogsteen        Cis                          cHH or cis Hoogsteen/Hoogsteen          Antiparallel
Hoogsteen/Hoogsteen        Trans                        tHH or trans Hoogsteen/Hoogsteen        Parallel
Hoogsteen/Sugar edge       Cis                          cHS or cis Hoogsteen/Sugar edge         Parallel
Hoogsteen/Sugar edge       Trans                        tHS or trans Hoogsteen/Sugar edge       Antiparallel
Sugar edge/Sugar edge      Cis                          cSS or cis Sugar edge/Sugar edge        Antiparallel
Sugar edge/Sugar edge      Trans                        tSS or trans Sugar edge/Sugar edge      Parallel
Figure 5 | (a) The three hydrogen bonding edges of a nucleotide base (shown for Guanine), with the nomenclature of each edge, and
(b) Cis and Trans orientations of the glycosidic bonds of the two nucleotide residues of a base pair with respect to the hydrogen
bonding direction (panels: G:C W:W Cis and G:C W:W Trans). The arrows in (b) indicate glycosidic bonds as vectors.


The nucleotide bases are nearly planar heterocyclic
moieties, with a conjugated pi-electron cloud, and with
several hydrogen bonding donors and acceptors
distributed around the edges, usually designated as W, H or S,
based on whether the edge can be involved in forming a
Watson-Crick base pair or a Hoogsteen base pair, or
whether the edge is adjacent to the C2'-OH
group of the ribose sugar. Eric Westhof and Neocles
Leontis used these edge designations to propose a
nomenclature scheme for base pairs that is currently
widely accepted. The hydrogen bonding donor and acceptor
atoms could thus be classified in terms of their
positioning along the three edges, namely the Watson-Crick or
W edge, the Hoogsteen or H edge, and the Sugar or S
edge (Figure 5). Since base pairs are mediated through
hydrogen bonding interactions based on hydrogen
bond donor-acceptor complementarity, this, in turn,
provides a convenient bottom-up approach towards
classifying base pair geometries in terms of the respective
interacting edges of the participating bases. It may be
noted that, unlike the Hoogsteen edge of purines, the
corresponding edges of the pyrimidine bases do not
have any polar hydrogen bond acceptor atom such as
N7. However, these bases have C-H groups at their C6
and C5 atoms, which can act as weak hydrogen bond
donors, as proposed by Gautam Desiraju. The
Hoogsteen edge, hence, is also called the Hoogsteen/C-H
edge in a unified scheme for designating equivalent
positions of purines as well as pyrimidines. Thus, the total
number of possible edge combinations involved in base
pairing is 6, namely Watson-Crick/Watson-Crick (or
W:W), Watson-Crick/Hoogsteen (or W:H),
Watson-Crick/Sugar (or W:S), Hoogsteen/Hoogsteen (or H:H),
Hoogsteen/Sugar (or H:S) and Sugar/Sugar (or S:S).


In the canonical Watson-Crick base pairs, the glycosidic 
bonds attaching the N9 (of purine) and N1 (of pyrimi- 
dine) of the paired bases with their respective sugar 
moieties, are on the same side of the mean hydrogen 
bonding axis, and are hence called Cis Watson-Crick 
base pairs. However, the relative orientations of the 
two sugars may also be Trans with respect to the mean 
hydrogen bonding direction giving rise to a distinct 
Trans Watson-Crick geometric class, consisting of spe- 
cies which were earlier referred to as reverse Watson- 
Crick base pairs according to Saenger nomenclature. 
The possibility of both Cis and Trans glycosidic bond ori- 
entation for each of the 6 possible edge combinations, 
gives rise to 12 geometric families of base pairs (Table 1).


According to the Leontis-Westhof scheme, any base
pair can be systematically and unambiguously named 
using the syntax <Base_1: Base_2><Edge_1: 
Edge_2><Glycosidic Bond Orientation> where Base_1 
and Base_2 carry information on respective base iden- 
tities and their nucleotide number. This nomenclature 
scheme also allows us to enumerate the total number 
of distinct possible base pair types. For a given glyco- 
sidic bond orientation, say Cis, the four naturally occur- 
ring bases each have three possible edges for formation 
of base pairs giving rise to 12 such possible base pairing 
edge identities, each of which can in principle form base 
pairing with any edge of another base, irrespective of 
complementarity. This gives rise to a 12x12 symmetric 
matrix displaying 144 pairwise permutations of base 
pairing edge identities, where, apart from the 12 diago- 
nal entries, others include repeat combinations. Thus, 
there are 78 (= 12 + 132/2) unique entries corresponding 
to the cis glycosidic bond orientation. Considering both 
cis and trans glycosidic bond orientations, the number 
of base pair types amounts to 156. 
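The counting argument above can be checked with a short enumeration. The following is a minimal sketch, assuming RNA bases (A, G, C, U) and a plain "Base_1:Base_2 Edge_1:Edge_2 orientation" string as the name format; the variable and function names are illustrative and are not taken from any published tool.

```python
from itertools import combinations_with_replacement

BASES = ("A", "G", "C", "U")      # four naturally occurring RNA bases
EDGES = ("W", "H", "S")           # Watson-Crick, Hoogsteen, Sugar edges
ORIENTATIONS = ("cis", "trans")   # glycosidic bond orientations

# 12 base-edge identities, e.g. ("G", "W") for the Watson-Crick edge of Guanine
base_edges = [(base, edge) for base in BASES for edge in EDGES]

# Unordered pairs of base-edge identities, diagonal included:
# 12 diagonal entries + 132/2 off-diagonal entries = 78
unique_pairs = list(combinations_with_replacement(base_edges, 2))


def lw_name(pair, orientation):
    """Illustrative name string in the <Base_1:Base_2><Edge_1:Edge_2><orientation> pattern."""
    (base1, edge1), (base2, edge2) = pair
    return f"{base1}:{base2} {edge1}:{edge2} {orientation}"


all_types = [lw_name(pair, o) for o in ORIENTATIONS for pair in unique_pairs]

print(len(base_edges), len(unique_pairs), len(all_types))
```

As in the text, this count is purely combinatorial: it makes no check for hydrogen bond donor-acceptor complementarity.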


Of course, this number 156 is only an indicator. It in- 
cludes base-edge combinations where base pairs can- 
not be formed due to absence of hydrogen bond donor 
acceptor complementarities. For example, potential 
pairing between two Guanine residues utilizing their 
Watson-Crick edges in cis form (cWW) is not supported 
by hydrogen bonding donor-acceptor complementarity,
and is not observed with a consistent hydrogen
bonding pattern. This method of enumerating the
possible number of distinct base pair types also does not
consider possibilities of multimodality or bifurcated
base pairs, or even instances of base pairs involving
modified bases, protonated bases and water or ion
mediation in hydrogen bond formation. Two Cytosine
bases can form trans Watson-Crick/Watson-Crick (tWW)
base pairs in their neutral as well as hemi-protonated
forms, possibly both, giving rise to the i-motif
DNA. However, C(+):C tWW and C:C tWW are
counted as a single type among the 156 possible types.


Classification based on isostericity 


Although there are significant differences between the
structures of non-canonical base pairs belonging to
different geometric families, some base pairs within the
same geometric family have been found to substitute for
each other without disrupting the overall structure.
These base pairs are called isosteric base pairs. Isosteric
base pairs always belong to the same geometric family,
but all the base pairs in a particular geometric family are
not always isosteric. Two base pairs are called isosteric
if they meet the following three criteria: (i) the C1'-C1'
distances should be similar; (ii) the paired bases should
be related by similar rotations in 3D space; and (iii)
H-bond formation should occur between equivalent base
positions. A detailed approach towards quantifying
isostericity, in terms of an IsoDiscrepancy Index
(IDI), which can facilitate reliable prediction regarding
which base pair substitutions can potentially occur in
conserved motifs, was formulated by Neocles Leontis,
Craig Zirbel and Eric Westhof. Based on IDI values
and available base pair structural data, the group
maintains a curated online base pair catalogue and an
updated set of Isostericity Matrices (IM) corresponding to
each of the 12 geometric families. Using this resource,
one can comprehensively classify different types of
canonical and non-canonical base pairs in terms of their
positions in the Isostericity Matrices. This approach, for
example, indicates that the four base pair types A:U
cWW, U:A cWW, G:C cWW and C:G cWW are isosteric
to each other. Thus, as also confirmed by detailed
sequence comparisons, double mutations altering A:U
cWW to U:A cWW or even to G:C cWW may not disturb
the structure, and, unless stability issues are involved,
the function of the related RNA. It was also found that
the wobble G:U cWW base pair is not really isosteric to
the U:G cWW base pair, indicating that such double
mutations may significantly affect the functioning of the
corresponding RNA. On the other hand, some of the base
pairs which are stabilized through the Sugar edges of the
bases are mutually isosteric.


Classification based on local strand direction


It may be noted here that, because of the geometric
relationship of the bases with the sugar-phosphate
backbone, these 12 geometric families of base pairs are
associated with two possible local strand orientations,
namely parallel and antiparallel. For the 6 families with
edge combinations involving Watson-Crick and Sugar
edges, W:W, W:S and S:S, the cis and trans families are
respectively associated with antiparallel and parallel 5' to
3' local strand directions (Table 1). Introduction of the
Hoogsteen edge, as one of the partners in the
combination, causes an inversion in the relationship. Thus,
for W:H and H:S, cis and trans respectively correspond
to parallel and antiparallel local strand orientations. As
expected, when both the edges are H, a double
inversion is observed, and H:H cis and trans correspond
respectively to antiparallel and parallel local strand
orientations. The annotation of local strand orientation in
terms of parallel and antiparallel directions helps to
understand which faces of the individual bases can be
seen for a given base pair from the 5' or the 3' side
(Table 1). This annotation also helps in classifying the
12 geometries into two groups of 6 each, where the
geometries can potentially interconvert within each
group by in-plane relative rotation of the bases.
However, one should note that the above theory is
applicable only when the glycosidic torsion angles of both the
nucleotide residues are anti. Notably, crystallographic
observations and energetic considerations
indicate that syn glycosidic torsions are also quite
possible. Hence the above classification of the parallel or
antiparallel nature of strand directions, by itself, does not
always provide a complete understanding.
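The inversion rule described above amounts to a parity count: starting from antiparallel, each Hoogsteen edge in the pair and a trans glycosidic orientation each flip the local strand orientation once. The following minimal sketch reproduces the last column of Table 1 under that reading; the function name and the three-letter family codes built here are illustrative, and, as stated in the text, the rule assumes that both glycosidic torsions are anti.

```python
from itertools import combinations_with_replacement

EDGES = ("W", "H", "S")  # Watson-Crick, Hoogsteen, Sugar


def local_strand_orientation(edge1, edge2, orientation):
    """Parity rule (assumes anti glycosidic torsions for both residues).

    Each Hoogsteen edge and a trans orientation contribute one inversion;
    an even number of inversions gives antiparallel, an odd number parallel.
    """
    flips = (edge1 == "H") + (edge2 == "H") + (orientation == "trans")
    return "parallel" if flips % 2 else "antiparallel"


# Build the 12 geometric families with codes such as "cWW" or "tHS"
families = {}
for edge1, edge2 in combinations_with_replacement(EDGES, 2):
    for orientation in ("cis", "trans"):
        code = orientation[0] + edge1 + edge2
        families[code] = local_strand_orientation(edge1, edge2, orientation)

for code, direction in families.items():
    print(code, direction)
```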


Various functional RNA molecules are stabilized, in
their specific folded pattern, by both canonical and
non-canonical base pairs. Most tRNA molecules, for
example, are known to have four short double helical
segments, giving rise to a cloverleaf-like
two-dimensional structure. The three-dimensional structure of
tRNA, however, takes an L-shape. As shown in Figure 6,
this is mediated by several non-canonical base pairs and
base triplets. The D-loop and TΨC loop are held
together by several such base pairs. While it is not
possible to include here the complete range of non-canonical
base pair varieties, some of the frequently occurring
representatives are shown in Figure 1. Interested
readers are encouraged to browse through different
websites such as NDB, RNABPDB, RNABP COGEST,
etc., to get a better understanding.


It may be noted that the above scheme is valid for nat- 
urally occurring nucleotide bases. However, there are 
plenty of examples of post-transcriptional chemical 
modifications of the bases, many of which are seen in 
tRNAs or ribosomes. It may be important to understand
their structural features also.


Figure 6 | (a) Cloverleaf model of tRNAPhe (picture created by VARNA[35] for PDB: 1EHZ) and (b) a typical base triplet involving
residues 9, 12 and 23 of the same tRNA

Identification of Non-canonical Base-pairs


In the case of double helical DNA, identification of base
pairs is quite trivial using molecular visualizers such as
VMD, RasMol, PyMOL, etc. It is, however, not so simple
for single stranded, folded, functional RNA
molecules. Several algorithms have been implemented in
software tools for the automated detection of base
pairs in RNA structures solved by X-ray crystallography,
NMR or other methods. Essentially, the programs
detect hydrogen bonds between two bases, and ensure
their (near) planar orientation, before reporting that
they constitute a base pair. Since most of the structures
of RNA available in the public domain are solved by X-ray
crystallography, the positions of hydrogen atoms are
rarely reported. Hence, detection of hydrogen bonds
becomes a non-trivial job.
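The (near) planarity requirement mentioned above can be illustrated with a small geometric check. This is a minimal sketch, not the procedure of any particular program: it assumes that each base plane is defined by just three ring atoms (real tools typically fit a plane to all ring atoms), and the 30 degree cutoff in `is_near_coplanar` is an arbitrary illustrative value.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def plane_normal(p1, p2, p3):
    """Unit normal of the plane through three non-collinear points."""
    n = _cross(_sub(p2, p1), _sub(p3, p1))
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)


def interplanar_angle(base1, base2):
    """Angle in degrees (0 to 90) between the normals of two base planes."""
    n1 = plane_normal(*base1)
    n2 = plane_normal(*base2)
    cos_angle = min(1.0, abs(sum(a * b for a, b in zip(n1, n2))))
    return math.degrees(math.acos(cos_angle))


def is_near_coplanar(base1, base2, max_angle=30.0):
    """max_angle is a hypothetical cutoff chosen only for illustration."""
    return interplanar_angle(base1, base2) <= max_angle


# Hypothetical coordinates (in Angstrom): three atoms per base
base_a = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
base_b = ((3.0, 0.0, 0.0), (4.0, 0.0, 0.0), (3.0, 1.0, 0.0))   # same plane as base_a
base_c = ((3.0, 0.0, 0.0), (4.0, 0.0, 0.0), (3.0, 0.0, 1.0))   # perpendicular plane

print(interplanar_angle(base_a, base_b), interplanar_angle(base_a, base_c))
```

Parallel normals alone do not distinguish a base pair from a stacked pair of bases, which is why such a check is combined with hydrogen bond detection.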


The DSSR algorithm by Xiang-Jun Lu and Wilma K. Olson
considers two bases to be paired when it detects one or
more hydrogen bonds between the bases, by actually
modeling the positions of the hydrogen atoms, and by
ensuring that the perpendiculars to the two bases are
nearly parallel to each other. The positions of the
hydrogen atoms can be deduced by converting internal
coordinates (bond length, bond angle and torsion angle, as in a
Z-matrix), together with the positions of precursor atoms,
such as the amino group nitrogen atom and the atoms bonded
to it, into external Cartesian coordinates. The base
pairs identified by this method are listed in the NDB and
FR3D databases.
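The conversion from internal to Cartesian coordinates can be sketched as follows. This is a generic Z-matrix style placement, not the DSSR implementation: given three precursor atoms a, b and c, it places a new atom d at a given c-d bond length, b-c-d bond angle and a-b-c-d torsion. The coordinates, the 1.01 Angstrom N-H bond length and the 120 degree angle used in the example are illustrative assumptions for a planar amino group.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def _scale(u, s):
    return tuple(a * s for a in u)


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _unit(u):
    length = math.sqrt(sum(a * a for a in u))
    return tuple(a / length for a in u)


def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """Cartesian position of atom d from internal coordinates.

    |c-d| = bond, angle(b, c, d) = angle_deg, torsion(a, b, c, d) = torsion_deg.
    The precursor atoms a, b and c must not be collinear.
    """
    theta = math.radians(angle_deg)
    phi = math.radians(torsion_deg)
    bc = _unit(_sub(c, b))                 # axis along the b -> c bond
    n = _unit(_cross(_sub(b, a), bc))      # normal to the a-b-c plane
    m = _cross(n, bc)                      # in-plane axis perpendicular to bc
    coeffs = (-bond * math.cos(theta),
              bond * math.sin(theta) * math.cos(phi),
              bond * math.sin(theta) * math.sin(phi))
    d = c
    for coeff, axis in zip(coeffs, (bc, m, n)):
        d = _add(d, _scale(axis, coeff))
    return d


# Hypothetical precursor atoms of an exocyclic amino group, all in the z = 0 plane
ring_atom = (-1.2, 0.7, 0.0)
ring_carbon = (0.0, 0.0, 0.0)
amino_n = (1.34, 0.0, 0.0)

# Two amino hydrogens, assumed planar: torsion 0 (cis) and 180 (trans) degrees
h_cis = place_atom(ring_atom, ring_carbon, amino_n, 1.01, 120.0, 0.0)
h_trans = place_atom(ring_atom, ring_carbon, amino_n, 1.01, 120.0, 180.0)
print(h_cis, h_trans)
```

Because both torsions are 0 or 180 degrees, the two modeled hydrogens stay in the plane of the precursor atoms, as expected for an amino group conjugated with the base ring.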


A unique way of identifying base pairs in RNA was
incorporated in MC-Annotate by Francois Major. This
algorithm makes use of the positions of the hydrogen
atoms as well as lone-pair electrons, obtained using
suitable molecular mechanics/dynamics force-fields, and
derives hydrogen bond formation probabilities for them.
The final identification of base pairs is done based on
these probabilities and on the approach of hydrogen atoms to
the lone-pair electrons of nitrogen or oxygen. This method
also attempted to extend the base pair nomenclature
with additional information on each interacting edge,
such as Ws indicating the sugar edge corner of the
Watson-Crick edge, Wh representing the Hoogsteen edge
corner of the Watson-Crick edge, Bw indicating a bifurcated
three-center hydrogen bond in which both the hydrogen
atoms of an amino group form hydrogen bonds
with a carbonyl oxygen involving both of its lone-pairs,
etc. As claimed by the authors, this nomenclature
scheme adds some additional features to the
Leontis-Westhof (LW) scheme and may be referred to as the
LW+ scheme. A major advantage of this scheme lies in
its ability to distinguish between alternative base
pairing geometries, where multimodality is observed within
an LW family. This method, however, does not consider
the possible participation of the 2'-OH group of the
ribose sugars in base pair formation.


Another algorithm, namely BPFIND by Dhananjay
Bhattacharyya and coworkers, demands at least two
hydrogen bonds using two distinct sets of donor and
acceptor atoms between the bases. This hypothesis-driven
algorithm considers the distances between two pairs
of atoms, the hydrogen bond donors (D1 and D2) and
acceptors (A1 and A2), and four suitably chosen precursor
atoms (PD1, PD2, PA1, PA2) corresponding to the D's and
A's (as shown for a representative base pair in Figure 7).
Small values of such distances, in conjunction with large
values of the angles θ1(PD1-D1-A1),
θ2(D1-A1-PA1), θ3(PD2-D2-A2) and θ4(D2-A2-PA2)
(close to 180° or π radians), ensure two structural features
which characterize well defined base pairs: i) the
hydrogen bonds are strong and linear and ii) the two bases are
co-planar. Notably, so long as one restricts the search
to base pairs which are stabilized by at least two distinct
hydrogen bonds, the above algorithms, by and large,
yield the same set of base pairs in different RNA
structures.
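The geometric criteria described for BPFIND can be sketched as a pair of distance and angle tests. The code below is only an illustration of the idea, not the BPFIND program: the function names, the 3.8 Angstrom distance cutoff and the 120 degree angle cutoff are assumptions chosen for the example, and the coordinates are hypothetical.

```python
import math


def angle_deg(p, q, r):
    """Angle p-q-r (vertex at q) in degrees."""
    v1 = [a - b for a, b in zip(p, q)]
    v2 = [a - b for a, b in zip(r, q)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(a * a for a in v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def hbond_ok(pd, d, a, pa, max_dist=3.8, min_angle=120.0):
    """Short donor-acceptor distance plus large PD-D-A and D-A-PA angles.

    max_dist and min_angle are illustrative cutoffs, not published values.
    """
    return (math.dist(d, a) <= max_dist
            and angle_deg(pd, d, a) >= min_angle
            and angle_deg(d, a, pa) >= min_angle)


def is_base_pair(hbonds, **cutoffs):
    """hbonds: list of (PD, D, A, PA) coordinate tuples.

    Requires at least two acceptable hydrogen bonds that use
    distinct donor atoms and distinct acceptor atoms.
    """
    good = [hb for hb in hbonds if hbond_ok(*hb, **cutoffs)]
    donors = {hb[1] for hb in good}
    acceptors = {hb[2] for hb in good}
    return len(donors) >= 2 and len(acceptors) >= 2


# Hypothetical, idealized coordinates (Angstrom): two linear, antiparallel hydrogen bonds
hb1 = ((-1.3, 0.0, 0.0), (0.0, 0.0, 0.0), (2.9, 0.0, 0.0), (4.2, 0.0, 0.0))
hb2 = ((4.2, 2.4, 0.0), (2.9, 2.4, 0.0), (0.0, 2.4, 0.0), (-1.3, 2.4, 0.0))
# Same as hb1 but with the acceptor moved too far away from the donor
hb1_far = ((-1.3, 0.0, 0.0), (0.0, 0.0, 0.0), (4.5, 0.0, 0.0), (5.8, 0.0, 0.0))

print(is_base_pair([hb1, hb2]), is_base_pair([hb1_far, hb2]))
```

Using heavy atoms only (donor, acceptor and their precursors) avoids the need to model hydrogen positions, which are rarely available in crystal structures.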


Sometimes in the crystal structures it is observed that 
two closely spaced bases are oriented in such a way that 
apart from the regular hydrogen bonds two additional 
electronegative hydrogen bond acceptor atoms are 
very close to each other, which may cause electrostatic 
repulsion. The concept of protonated base pairing,
implicating a possible protonation of one of these
electronegative, (potentially) hydrogen bond acceptor atoms,
thus converting it into a hydrogen bond donor, was
introduced to explain the stability of such
geometries. Some of the NMR derived structures
also support the protonation hypothesis, but possibly 
more rigorous studies using neutron diffraction or other 
techniques would be able to confirm it. The quality of 
the crystal structures permitting, some algorithms also 
attempted to detect water or cation mediated base pair 
formation. 
Figure 7 | Descriptions of the hydrogen bonding atoms, along with their precursors, for a typical non-canonical base pair (as used
by BPFIND)


Strengths and stability of Non-canonical Base-pairs


The canonical Watson-Crick base pairs, G:C and A:T/U 
as well as most of the non-canonical ones are stabilized 
by two or more (e.g. 3 in the case of G:C cWW) hydrogen 
bonds. Justifiably, a significant amount of research on 
non-canonical base pairs has been carried out towards 
bench-marking their strengths (interaction energies) 
and (geometric) stability against those of the canonical 
base pairs. It may be noted here that base pair geome- 
tries, as observed in the crystal structures, are often in- 
fluenced by several interactions present in the crystal 
environment, thus perturbing their intrinsically stable 
geometries arising out of the hydrogen bonding and re- 
lated interactions between the two bases. Therefore, in 
principle, it is possible that the observed geometries in 
some cases are intrinsically unstable, and that they are 
stabilized by other interactions provided by the envi- 
ronment. Several groups have attempted to determine 
the interaction energies in these non-canonical base 
pairs using different quantum chemistry based ap- 
proaches, such as Density Functional Theory (DFT) or 
MP2 methods. These methods 
were applied on suitably truncated, hydrogen-added, 
and geometry optimized models of the base (or nucle- 
oside) pairs extracted from PDB structures. Depending 
upon the optimization protocol, typically three types of 
interaction energies have been reported. In the first 
method, the base pair model geometries, isolated from 
their respective environments, are fully optimized 
without any constraints, thus providing the intrinsic 
geometries and interaction energies of the isolated 
models. This procedure, however, sometimes leads to 
optimized geometries of base pairs involving edges dif- 
ferent from initial crystal geometry. Abhijit Mitra and 
collaborators also used an additional second protocol, 
where the heavy atom (non-hydrogen) coordinates are 
retained as in the crystal geometries, optimizing only 
the positions of the added hydrogen atoms. In 
the third protocol, followed mostly by Jiri Sponer and 
his group, optimization was carried out with 
constraints on some angles and dihedrals. Given that the 
models are extracted from their respective crystal 
structures, and are isolated from their crystal environ- 
ments, the second and the third protocols provide two 
different approaches towards approximating the 
environmental effects, without explicit consideration of 
any specific environmental interactions. This has 
further been addressed in some reports by considering 
specific environmental factors, such as coordination 
with magnesium, or even some covalent modifications 
to the bases. 
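
Whichever optimization protocol is used, the interaction energy itself is usually obtained as the energy of the pair minus the energies of the two separated bases. The sketch below shows only this bookkeeping. The function name and the numerical energies are hypothetical, and corrections that are common in practice (for example for basis set superposition error or for deformation of the monomers) are deliberately left out.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard conversion factor


def interaction_energy(e_pair, e_base1, e_base2):
    """E_int = E(pair) - E(base1) - E(base2), converted from hartree to kcal/mol.

    A negative value means the pair is more stable than the separated bases.
    """
    return (e_pair - e_base1 - e_base2) * HARTREE_TO_KCAL_PER_MOL


# Made-up total energies in hartree, for illustration only.
e_int = interaction_energy(e_pair=-10.02, e_base1=-5.00, e_base2=-5.00)
print(f"{e_int:.2f} kcal/mol")
```

Applying the same formula to models from the three protocols gives three interaction energies for one observed pair, and their differences hint at how much the environment shapes the geometry.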


All three protocols are useful in their respective contexts. 
Further, a comparison of the model geometries obtained by 
the different protocols provides an idea of both the 
stability of the corresponding base pair geometries and the 
probable extent and nature of environmental influences. It 
was found that most non-canonical base pairs having two or 
more hydrogen bonds generally maintain the same hydrogen 
bonding pattern in the crystal geometry and in the geometry 
fully optimized in isolation, thus indicating their 
intrinsic geometric stability. Interaction energies 
calculated from these optimized models also indicated the 
energetic stability of the corresponding non-canonical base 
pairs. The earlier notion that non-canonical base pairs are 
weaker than the Watson-Crick base pairs was found to be 
incorrect. The interaction energies of several base pairs, 
such as G:G tWW, G:G cWH, A:U cHW, G:A cWW, G:U cWW, etc., 
are found to be larger than that of the canonical A:U cWW 
base pair. 


Of course, not all non-canonical base pairs are 
necessarily strong or stable in terms of interaction 
energy. Several base pairs have been detected on the basis 
of weak hydrogen bonds of the C-H...O/N type, where 
interaction energies are rather small. Further, geometry 
optimization of some of the observed base pairs, in 
particular, but not limited to, those involving weak 
hydrogen bonds or those stabilized by a single hydrogen 
bond, was found to lead to alternate geometries, thus 
indicating their intrinsic lack of geometric stability. 
Such alterations of hydrogen bonding schemes, giving rise 
to changes in base pairing family upon free optimization, 
may have some functional implication in RNA, such as 
their action as conformational switches. Accordingly, as 
mentioned above for Sponer's protocol, there have been 
some attempts to restrain the experimentally observed 
geometry while carrying out geometry optimization for 
interaction energy calculations. Interestingly, in several 
cases, interaction energies calculated for these 'away 
from intrinsically stable' geometries also indicate good 
energetic stability. 


Though the energetics and geometric stabilities of 
different non-canonical base pairs do not show any 
generalized correlations, analysis of several databases, 
such as RNABPDB and RNABP COGEST, which catalogue 
structural and energetic features of some of the observed 
base pairs and their stacks, reveals some interesting 
general trends. 


For example, geometry optimizations of several base 
pairs involving 2'-OH group of sugar residue resulted in 
significant alterations from their initial geometry. This 
is possibly due to flexibility of the sugar puckers and gly- 
cosidic torsions. The significantly high interaction ener- 
gies of protonated base pairs, despite the high energy 
cost of base protonation, also deserve a special mention 
in this context. This can mostly be attributed to the ad- 
ditional charge-induced dipole interactions which are 
associated with protonated base pairs. 


Structural features 


WikiJournal of Science, 2023, 6(1):2 
doi: 10.15347/wjs/2023.002 
Encyclopedic Review Article 


Figure 8 | IUPAC recommended intra base pair parameters (Buckle, Open, Propeller, Stretch, Shear and Stagger) used to describe the geometry of a Watson-Crick or non-canonical base pair 


Structural features of a base pair, formed by two planar 
rigid units, can be quantified using six parameters: three 
translational and three rotational. The IUPAC recommended 
parameters are Propeller, Buckle, Open Angle, Stagger, 
Shear and Stretch (Figure 8). A brief description of these 
in the context of the DNA double helical structure can be 
found in the Wiki page on nucleic acid double helix. There 
are several publicly available software packages, such as 
Curves by Richard Lavery, 3DNA by Wilma Olson, NUPARM by 
Manju Bansal, etc., which may be used to calculate these 
parameters. While the first two calculate the parameters of 
canonical and non-canonical base pairs relative to the 
standard canonical Watson-Crick base pair geometry, the 
NUPARM algorithm calculates them in absolute terms using a 
base pairing edge specific axis system. Hence, for most 
non-canonical base pairs, which involve non-Watson-Crick 
edges, some of the parameters (Open, Shear and Stretch) 
calculated by Curves or 3DNA are usually large even in 
their respective intrinsically most stable geometries. On 
the other hand, the values provided by NUPARM indicate the 
quality of hydrogen bonding and planarity of the two bases 
in a more realistic fashion. Thus, the NUPARM Stretch 
values, which indicate the separation of the two bases of a 
base pair and depend on optimal hydrogen bonding distances, 
are always around 3 Å. Some other general trends observed 
in the values of the above parameters are worth noting. 
Most of the cis base pairs are seen to have Propeller 
values around -10° and small values of Buckle and Stagger. 
The Open and Shear values often depend on the positions of 
the hydrogen bonding atoms. For example, G:U cWW wobble 
base pairs have Shear values around -2.2 Å, while G:C or 
A:U cWW base pairs have Shear values around zero. The Open 
values for most base pairs are close to zero, but the 
values are often rather large for those involving the 2'-OH 
group of the sugar in the NUPARM derived parameter set. The 
trans base pairs, however, do not show any systematic trend 
in their Propeller values. 
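
The three translational parameters can be pictured as the components of the displacement between the two bases along the axes of a base pair reference frame. The sketch below assumes that such an orthonormal frame has already been built and that Shear, Stretch and Stagger are measured along its x, y and z axes respectively, which is the usual convention. How the frame is constructed, and the sign conventions, differ between Curves, 3DNA and NUPARM and are not reproduced here. All names and numbers are hypothetical.

```python
def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def translational_parameters(origin1, origin2, x_axis, y_axis, z_axis):
    """Project the origin-to-origin displacement onto an assumed base pair frame.

    The axes must be orthonormal; building them from atomic coordinates is
    outside the scope of this sketch.
    """
    displacement = [b - a for a, b in zip(origin1, origin2)]
    return {
        "shear": dot(displacement, x_axis),
        "stretch": dot(displacement, y_axis),
        "stagger": dot(displacement, z_axis),
    }


# Made-up base origins (in Å) chosen to resemble the wobble-like values quoted above.
params = translational_parameters(
    origin1=(0.0, 0.0, 0.0),
    origin2=(-2.2, 3.0, 0.1),
    x_axis=(1.0, 0.0, 0.0),
    y_axis=(0.0, 1.0, 0.0),
    z_axis=(0.0, 0.0, 1.0),
)
print(params)
```

Because the result depends entirely on the chosen frame, the same pair can give small values in an edge specific frame and large ones in a frame defined for Watson-Crick pairs.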


Roles of Non-canonical Base-pairs in RNA 


The structural hierarchy in RNA is usually described in 
terms of a stem-loop 2D secondary structure, which further 
folds to form its 3D tertiary structure, stabilized by 
what are referred to as long range tertiary contacts. 
Most often the non-canonical base pairs are involved in 
those tertiary contacts or extra-stem base pairs. For 
example, some of the non-canonical base pairs in tRNA 
appear between the D-stem and TΨC loops (Figure 5), which 
are close in the three-dimensional structure. Such base 
pairing interactions give stability to the L-shaped 
structure of tRNA. In this region, some base pairs are 
found to be additionally hydrogen bonded to a third base. 
Thus, as shown in Figure 6, the 23rd residue is 
simultaneously paired to the 9th and 12th residues, 
together forming a base triple, the smallest member of the 
class of higher order multiplets. 


Multiplets 


One base, in addition to forming a proper planar base 
pair with a second base, can often participate in base 
pair formation with a third base, forming a base triple. 
One such classic example is the formation of the DNA triple 
helix, where two bases of two antiparallel strands form 
consecutive Watson-Crick base pairs in a double helix 
and a base of a third strand forms a Hoogsteen base pair 
with the purine base of the Watson-Crick base pair. Many 
different types of base triples have been reported in the 
available RNA structures and have been elegantly 
classified in the literature. Multiplets are, however, not 
limited to triplet formation. Four bases giving rise to a 
base quartet is now well documented in the structure of 
the G-quadruplex (Figure 3) characteristically found in 
the telomere. Here four guanine residues pair up among 
themselves in a cyclic form involving the 
Watson-Crick/Hoogsteen cis (cWH) base pairing scheme, so 
that each guanine base interacts with two other guanine 
bases. Three to four such G-quartets stack on top of one 
another to form a four stranded DNA structure. In addition 
to such a cyclic topology, several other topologies of 
base:base pairings are possible for higher order 
multiplets such as quartets, pentets, etc. 
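
The difference between a cyclic multiplet, such as the G-quartet, and other topologies can be made concrete by treating bases as nodes and base pairs as edges of a small graph. The sketch below is a simplified illustration. The labels "cyclic", "linear" and "other", the function name, and the residue identifiers are assumptions for this example and do not correspond to any published classification scheme.

```python
from collections import defaultdict


def multiplet_topology(pairs):
    """Classify a multiplet from its list of (base, base) pairs.

    'cyclic': connected, every base pairs with exactly two others.
    'linear': connected chain with two end bases.
    'other' : anything else (for example branched or disconnected).
    """
    if not pairs:
        return "other"
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)

    # Depth-first search to check that all bases belong to one multiplet.
    nodes = list(partners)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for neighbour in partners[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    if len(seen) != len(nodes):
        return "other"

    degrees = [len(p) for p in partners.values()]
    if all(d == 2 for d in degrees):
        return "cyclic"
    if degrees.count(1) == 2 and all(d <= 2 for d in degrees):
        return "linear"
    return "other"


g_quartet = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G4", "G1")]
base_triple = [(23, 9), (23, 12)]  # residue 23 paired to residues 9 and 12
print(multiplet_topology(g_quartet), multiplet_topology(base_triple))
```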


Double helical regions 


Non-canonical base pairs quite frequently appear 
within double helical regions of RNA. The G:U cWW 
non-canonical base pairs are seen very frequently 
within double helical regions as this base pair is nearly 
isosteric to the canonical ones. Due to the complication 
of strand direction, as elaborated in the Classification 
section (Table 1), not all types of non-canonical 
base pairs can be accommodated within double helical 
regions with anti glycosidic torsion angles. However, 
many non-canonical base pairs, e.g. A:G tHS (trans 
Hoogsteen/Sugar edge) or A:U tHW (trans 
Hoogsteen/Watson-Crick), A:G cWW, etc., are often 
seen within double helical regions giving rise to sym- 
metric internal loop like motifs. Attempts have been 
made to classify all such situations where two base pairs 
(canonical or non-canonical) stack in anti-parallel sense 
possibly giving rise to double helical regions in RNA 
structures. These base pairs are quite stable, and they 
are able to maintain the helical property quite well. The 
backbone torsion angles around these residues are also 
generally within reasonable limits: C3'-endo sugar 
pucker with anti glycosidic torsion, α/γ torsion angles 
around -60°/60°, β/ε torsion angles around 180°. 
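
The torsion angles quoted above are dihedral angles defined by four consecutive backbone atoms. A minimal sketch of how such an angle can be computed from Cartesian coordinates, and compared with a target value while allowing for the 360° periodicity, is given below. The function names and the 30° tolerance are assumptions made for this example.

```python
import math


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range -180 to 180.

    For example, alpha is O3'(i-1)-P-O5'-C5' and gamma is O5'-C5'-C4'-C3'.
    """
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def near(angle, target, tolerance=30.0):
    """True if angle is within the (hypothetical) tolerance of target, modulo 360."""
    difference = (angle - target + 180.0) % 360.0 - 180.0
    return abs(difference) <= tolerance


# Made-up coordinates giving a 90 degree and a 180 degree (trans) dihedral.
gauche_like = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(round(gauche_like, 1), near(trans, 180.0), near(-170.0, 180.0))
```

The periodic comparison matters for β and ε, since values of 175° and -175° are both close to 180°.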
Recurrent structural motifs 


Figure 9 | An example of higher order structure (C-loop) in RNA by formation of base triples using non-canonical base pairs from PDB: 1KOG (a) by schematic representation and (b) by molecular visualizer 


Non-canonical base pairs often appear in different 
structural motifs, including pseudoknots, with their 
special hydrogen bonding features. Structural features 
of these recurrent motifs have been archived in 
searchable databases, such as FR3D and RNA FRABASE. 
Also, several of these motifs can be identified in a given 
query PDB file by the NASSAM web-server. They are 
most frequently detected at the termini of double helical 
segments, acting as capping residues, often preceding 
hairpin loops. The most frequently found non-canonical 
base pair, namely G:A tSH, is an integral part of 
GNRA tetraloops, where N can be any nucleotide residue 
and R is a purine residue. This motif shows some 
amount of flexibility and alterations of structural 
features depending on whether the guanine and adenine 
are paired or not. Several other types of tetraloop 
motifs, such as UNCG, YNMG, GNAC, CUYG (where Y 
stands for pyrimidine and M is either adenine or 
cytosine), etc., have been found in available RNA 
structures. However, these do not generally show 
involvement of non-canonical base pairing. In addition to 
these common hairpin motifs, where the loop residues 
largely remain unpaired, there are also a few motifs where 
the loop residues make extensive interactions among 
themselves or with other residues external to the loop. 
A common example is the C-loop motif, where the 
bulging loop residues form non-canonical base pairs 
with the bases of double helical regions (Figure 9). The 
extra base pairs in these cases give additional 
stabilization to the composite motif containing the double 
helix. Non-canonical base pairs are also involved in 
receptor-loop interactions, such as in the T-loop motif. 
Another interesting example of the involvement of 
non-canonical base pairs in recurrent contexts was detected 
as the GAAA receptor motif, which consists of an A:A cHS 
base pair followed by a U:A tWH base pair, stacked on both 
sides by G:C cWW base pairs. Here we have successive 
non-canonical base pairs within an antiparallel RNA double 
helical domain. Similarly, there is an A:A cSH base pair 
involving two consecutive residues in this motif. Such 
pairing between consecutive residues, which is also termed 
a dinucleotide platform motif, is quite commonly observed. 
Such platforms appear in many RNA structures, and the 
pairing can also be between other bases. Dinucleotide 
platforms have been reported for A:A, A:G, A:U, G:A and G:U 
base pairs belonging to the cSH class and also for A:A 
cHH base pairs. These motifs can alter the strand 
direction within a double helix by formation of kinks. Such 
a dinucleotide platform, along with triplet formation, is 
also an integral component of the Sarcin-ricin motif. 
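
The tetraloop families named above are written with nucleotide ambiguity codes (N for any residue, R for purine, Y for pyrimidine, M for adenine or cytosine), so a four-residue loop sequence can be matched against them with simple regular expressions. The sketch below is only a sequence-level illustration with hypothetical function names. A sequence match identifies a candidate loop; it does not establish that the G:A tSH pair, or any other pairing, is actually formed in a structure.

```python
import re

# Ambiguity codes as described in the text, expressed as regex character classes.
CODES = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "N": "[ACGU]",  # any nucleotide
    "R": "[AG]",    # purine
    "Y": "[CU]",    # pyrimidine
    "M": "[AC]",    # adenine or cytosine
}

TETRALOOP_FAMILIES = ["GNRA", "UNCG", "YNMG", "GNAC", "CUYG"]


def family_regex(pattern):
    """Translate a pattern such as 'GNRA' into a compiled regular expression."""
    return re.compile("".join(CODES[symbol] for symbol in pattern))


def classify_tetraloop(loop):
    """Return every family whose pattern matches the four loop residues."""
    loop = loop.upper().replace("T", "U")
    return [family for family in TETRALOOP_FAMILIES
            if family_regex(family).fullmatch(loop)]


for loop in ("GAAA", "UUCG", "CUUG", "ACGU"):
    print(loop, classify_tetraloop(loop))
```

Note that the families overlap at the sequence level: UUCG, for instance, satisfies both the UNCG and the YNMG patterns.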


Modeling of RNA structures containing Non-canonical base pairs 


Prediction of biomolecular structure from sequence 
alone is a long term goal of scientists working in the 
fields of bioinformatics, computational chemistry, sta- 
tistical physics as well as in computer science. Predic- 
tion of protein structures from amino acid sequence by 
methods like homology modeling, comparative modeling, 
threading, etc., were largely successful due to the 
availability of about 1200 unique protein folds. Inspired 
by the protein experience, there are now several 
approaches towards predicting RNA structures, albeit 
with varying degrees of success. Any comprehensive 
discussion on RNA modeling is beyond the scope of this 
article, and one may browse the “List of RNA structure 
prediction software” for getting an idea about the grow- 
ing interest in this area. Nevertheless, some general ob- 
servations, as summarized below, may be useful in the 
current context. 


It can be seen that most of the approaches are essen- 
tially limited to the prediction of RNA 2D stem-loop 
structure, also referred to as RNA secondary structure. 
For example, minimum computed free energy predic- 
tion of double helical regions of RNA sequences from 
the energy of base pairing and stacking interactions, es- 
sentially computationally derived from experimental 
thermodynamic data, was initially introduced by Ruth 
Nussinov and later by Michael Zuker. This, in turn, has 
inspired several related modified algorithms, including 
data on neighboring group interactions, etc. Most of 
these approaches, however, mainly consider data on 
canonical base pairing, with only a few which also con- 
sider thermodynamic data on Hoogsteen base pairs. 
Thus, in addition to the computational costs and com- 
plications associated with the identification of pseu- 
doknots, all these methods also suffer from the draw- 
back associated with the paucity of experimental data 
on non-canonical base pairs. 
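
The dynamic programming idea behind these secondary structure methods can be sketched in its simplest, Nussinov-style form, which maximizes the number of nested base pairs instead of minimizing a free energy. This is a deliberate simplification: real predictors use experimentally derived stacking and loop energies, and the sketch below ignores pseudoknots. The function name, the minimum loop length of three residues, and the option to count G:U wobble pairs are assumptions for this example.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3, allow_wobble=True):
    """Maximum number of nested base pairs (pair-counting dynamic programming).

    best[i][j] holds the best score for the subsequence seq[i..j].
    """
    allowed = WATSON_CRICK | (WOBBLE if allow_wobble else set())
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: residue j stays unpaired
            # case 2: residue j pairs with k, leaving at least min_loop residues between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0


sequence = "GGGAAAUCC"
print(max_base_pairs(sequence), max_base_pairs(sequence, allow_wobble=False))
```

In this toy sequence the third pair of the hairpin stem is a G:U wobble, so excluding non-canonical pairs shortens the predicted stem, which mirrors the limitation discussed above.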


However, there are also several approaches which 
attempt to predict the tertiary 3D structure corresponding 
to given predicted 2D structures. There are also a few 
involving 3D fragment based modeling, which are getting 
further facilitated by the increasing availability of 
motif wise curated RNA 3D structure data. It is also 
encouraging to note that there are now some software and 
servers, such as MC-Fold, RNAPDBee, RNAWolfe, etc., 
available for exploring non-canonical base pairing in RNA 
3D structures. Some of these methods depend on structural 
databases of RNA, such as FRABASE, to obtain 3D coordinates 
of motifs containing non-canonical base pairs and stitch 
the information with the 3D structure of double helices 
containing canonical base pairs. 


It may be relevant in this context to mention the 
approach towards 3D model building of double helical 
regions with both canonical and non-canonical base 
pairs used in 3DNA by Olson or in RNAHelix by 
Bhattacharyya and Bansal. These software suites use 
base pair parameters to generate 3D coordinates of in- 
dividual dinucleotide steps, which can be extended to 
model double helices of arbitrary lengths with canonical 
or non-canonical base pairs. 


The above mentioned methods attempt to model a single 
structure (2D or 3D) of a given RNA sequence. However, 
growing evidence indicates that a given RNA sequence can 
adopt an ensemble of structures and possibly interconvert 
between them. The members of such an ensemble adopt 
different base pairing patterns between different sets of 
residues. Thus, there are enough pointers to suggest that 
the focus on modeling single structures has been a 
bottleneck for accurate modeling of RNA structure. 


The theoretical prediction of RNA 2D structure, and 
consequently 3D structure, can also be confirmed by 
different chemical probing methods. One of the latest such 
tools is SHAPE (Selective 2'-hydroxyl acylation analyzed 
by primer extension), and SHAPE-Directed RNA Secondary 
Structure Prediction appears to be most promising. Coupled 
with mutational profiling, ensembles of RNA structures, 
which often include non-canonical base pairing, can be 
experimentally studied using the SHAPE-MaP approach. One 
of the ways ahead today appears to be an integration of 
Zuker's minimum free energy approach with experimentally 
derived SHAPE data, including simulated SHAPE data as 
outlined in Montaseri et al. (2016) and Spasic et al. 
(2017). 
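
One common way of combining SHAPE data with a free energy model is to convert each nucleotide's reactivity into a pseudo free energy term that is added whenever that nucleotide is placed in a paired region. The sketch below uses the logarithmic form with a slope and an intercept. The default values of 2.6 and -0.8 kcal/mol are quoted from memory of the SHAPE-directed folding literature and should be treated as assumptions, as should the function name and the handling of missing data.

```python
import math


def shape_pseudo_energy(reactivity, slope=2.6, intercept=-0.8):
    """Pseudo free energy (kcal/mol) = slope * ln(reactivity + 1) + intercept.

    A reactivity of None is treated here as missing data and contributes nothing.
    """
    if reactivity is None:
        return 0.0
    return slope * math.log(reactivity + 1.0) + intercept


# Made-up normalized reactivities: low values favour pairing, high values penalize it.
for value in (0.0, 0.3, 1.0, None):
    print(value, round(shape_pseudo_energy(value), 3))
```

With these assumed parameters an unreactive nucleotide receives a small bonus for being paired, while a highly reactive one is penalized, steering the folding algorithm towards structures consistent with the probing data.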


Conclusion 


Hydrogen bond mediated interactions between nucleotide 
bases, leading to base pair formation, constitute one of 
the most important classes of attractive interactions 
which shape the structure, dynamics and function of 
nucleic acids. With the determination of the structure of 
double stranded DNA molecules fueling the development and 
phenomenal growth of molecular biology, nucleic acid 
research was, for a long time, focused primarily on the 
canonical G:C and A:T/U base pairs. However, even in DNA, 
other types of base pairings, involving different 
geometries and base pairing partners, have been drawing 
attention in the context of structural and functional 
diversity. Non-canonical base pairs are far more abundant 
in RNA, where a single strand folds onto itself, often 
without the possibility of complementary canonical base 
pairs to stabilize the folds. The picture that emerges 
from ongoing research on the diverse structure, dynamics 
and function of RNA is that the diversity may be 
rationalized in terms of the structure, dynamics and 
stabilities of more than 100 types of base pairs, 
including non-canonical base pairs. The role of G:U W:W cis 
base pairs in the context of the Wobble hypothesis, and of 
Hoogsteen base pairing in the context of triple helices 
and G quartet formation, were initial indicators. Most of 
the tertiary interactions shaping the complex folding and 
functions of 3D RNA are mediated through non-canonical base 
pairs. What is particularly notable is that non-canonical 
base pairs are capable of creating appropriate localized 
distortions to provide functionally important structural 
variations, not only in RNA, but even in double stranded 
DNA. This becomes even more significant in the context of 
non-canonical base pairs occurring in the A-type double 
stranded regions of functional RNAs, which play an 
important role in molecular recognition of base sequence 
by locally distorting the otherwise inaccessible major 
groove. Thus, the field of non-canonical base pairing is 
still quite open for scientific contributions from 
different directions. In particular, a comprehensive 
characterization of non-canonical base pairs will have a 
far-reaching impact on RNA biotechnology, both in terms of 
prediction of structure and in terms of enriching our 
molecular level understanding of the functioning of non 
(protein) coding RNA. 